b'                            National Archives and Records Administration\n                                                                                                      8601 Adelphi Road\n                                                                                     College Park, Maryland 20740-6001\n\n\n    Date \t      April 23, 2010\n   Reply to\n   Attn of \t    Office of Inspector General (OIG)\n\n   Subject \t    Management Letter No. 10-10, Concerns with the Electronic Records Archives System\'s\n                Ability to Conduct Full-Text Searches\n\n   To \t         David Ferriero, Archivist of the United States (N)\n\n                The purpose of this management letter is to define our concerns as to the capacity and\n                capability of the Electronic Records Archives (ERA) system to search the records which it\n                will eventually store. The ERA Requirements Document (RD) dated October 17, 2005\n                defines the system\'s core requirements, and RD Sections 19 and 20 (attached) focus on the\n                search and access capabilities of ERA. Based upon our interpretation, the RD calls for a\n                system which at Final Operating Capability (FOC) would ingest, preserve and facilitate\n                authorized user search, retrieval and access to all data which resided in each record\n                maintained in ERA. For example, we believe ERA should be able to search the full text of\n                an email\'s body, not just the subject line or delivery addresses. It has come to our attention\n                that NARA program officials may decide to limit ERA\'s searching functions short of full-text\n                searches[1] due to the costs involved. 
With FOC looming in 2012, NARA has yet to make this\n                crucial decision, nor have they alerted appropriators of the resource issues involved.\n\n                The ERA Program Director was asked whether ERA users would have full-text search capability\n                for all data ingested into ERA and, if so, whether NARA had the equipment and funding\n                necessary, either in place or requested, to accomplish it. In response the ERA\n                Program Director stated:\n\n                      Full text search is a capability that is built-in with the Vivisimo Search\n                      Engine, the search engine selected for the ERA. The question is does it make\n                      sense to do full text searching on every record stored in ERA? Even if limited\n                      to a specific data type, such as text, this would be very resource intensive both\n                      in terms of the size of the index and in compute cycles to generate and\n                      maintain the indexes and to execute queries. This is a policy decision that will\n                      be made by NARA. According to our engineering staff, indexing for full text\n                      searching requires about a 3x increase in storage. 
These are implementation\n                      questions that will be answered during the next 8-12 months as the analysis,\n                      design and development of the production public access piece of ERA\n                      progresses.\n\n\n[1] A search that compares every word in a document, as opposed to searching an abstract or a set of keywords associated\nwith the document.\n\x0cWhen directly asked whether, if NARA did decide to do full text searching, the funding was already\nbuilt into the current contract increment\'s development cost (increment 3) or would\nhave to be built into the next increment (increment 4), the ERA Program Director stated:\n\n        The funding would come from increment 4 and/or increment 5. Beyond FOC,\n        as the archive continues to grow, it will be necessary to add more storage as\n        necessary.\n\nThus our understanding is that, while the ERA system will employ a publicly available\nsearch engine which should have full search capability, at this time no determination has\nbeen made as to whether this search engine will be able to actually search all of ERA\'s data.\nRather, this overarching question is still on the table for perhaps another full year even\nthough we have communicated to our stakeholders that FOC will arrive in March 2012. If\nthe decision is made to enable full-text search and access capabilities, NARA would require\nsignificant additional funding and resources prior to FOC. NARA officials were unable to\nprovide the OIG with any funding requests in this regard. This decision may have an impact\non funding for increments 4 and 5, as well as out years, or may even result in the need for a\nfollow-on contract. Lacking additional funding streams, per NARA\'s own definition, we\nwould be unable to support full-text search of our electronic holdings. 
Rather, ERA users\nwould be faced with the prospect of limited search capability and diminished capacity to\nnavigate through NARA\'s vast holdings.\n\nWith FOC and the end of the ERA development contract looming, NARA management\nneeds to make this critical decision in a timely manner and communicate it to our\nstakeholders. If you have any questions, or require additional information, please do not\nhesitate to contact me.\n\n\nPaul Brachfeld\nInspector General\n\n\ncc: NH (M. Morphy)\n\nAttachment:\nERA Requirements Document excerpt, dated October 17, 2005, 4 pages\n\n\x0c                                                                            Attachment 1\n\n\n     ERA Requirements Document Sections 19 and 20 Related to Search and Access Capabilities\n\n\n    ERA19 The system shall provide the capability to search the assets it contains\n    ERA19.1 The system shall provide the capability for the user to select the characteristics of a search against the assets it contains\n    ERA19.1.1 The system shall provide the capability for the user to enter the criteria for the search\n    ERA19.1.2 The system shall provide the capability to search by geographic information\n    ERA19.1.3 The system shall provide the capability to search by subject\n    ERA19.1.4 The system shall provide the capability to search by time period\n    ERA19.1.5 The system shall provide the capability to search by accession number\n    ERA19.1.6 The system shall provide the capability to search by transferring entity\n    ERA19.1.7 The system shall provide the capability to search by government function\n    ERA19.1.8 The system shall provide the capability to search by government line of business\n    ERA19.1.9 The system shall provide the capability to search by asset type\n    ERA19.1.10 The system shall provide the capability to search by geospatial identifiers\n    ERA19.1.11 
The system shall provide the capability to search by any element defined in the asset\'s template\n    ERA19.1.12 The system shall provide the capability to search by media type\n    ERA19.1.13 The system shall provide the capability to search by record type\n    ERA19.1.14 The system shall provide the capability to search descriptions by description unique identifier\n    ERA19.1.15 The system shall provide the capability to search by title\n    ERA19.1.16 The system shall provide the capability for keyword searching\n    ERA19.1.17 The system shall provide the capability for exact phrase searching\n    ERA19.1.18 The system shall provide the capability for concept-based searching\n    ERA19.1.19 The system shall provide the capability for Boolean searching\n    ERA19.1.20 The system shall provide the capability for proximity searching\n    ERA19.1.21 The system shall provide the capability to search based on the frequency of access to assets by other researchers doing similar searches\n    ERA19.1.22 The system shall provide the capability for automated question-and-answer searching based on searches performed frequently\n    ERA19.1.23 The system shall provide the capability to search only for descriptions that refer to electronic records\n    ERA19.1.24 The system shall provide the capability to use wildcard characters in searches\n    ERA19.1.25 The system shall provide the capability for searching records lifecycle data\n    ERA19.1.26 The system shall provide the capability for searching authority sources for people\n    ERA19.1.27 The system shall provide the capability for searching authority sources for organizations\n    ERA19.1.28 The system shall provide the capability for the use of designated "stop words" that are disregarded during searches\n    ERA19.1.29 The system shall provide the capability to search by transfer\n    ERA19.2 The system shall provide the capability to search for assets based on their contents\n    ERA19.3 The 
system shall provide the capability to search through hierarchies of information\n    ERA19.3.1 The system shall provide the capability to navigate from a description to an individual electronic record\n    ERA19.3.2 The system shall provide the capability to navigate from an individual electronic record to an associated description\n    ERA19.3.3 The system shall provide the capability to navigate from a description to a set of electronic records\n    ERA19.3.4 The system shall provide the capability to navigate from a set of electronic records to a description of the set\n    ERA19.3.5 The system shall provide the capability to navigate through all levels of records lifecycle data while searching\n    ERA19.3.6 The system shall provide the capability to navigate through all levels of sets of records while searching\n    ERA19.3.7 The system shall provide the capability to navigate from a description to a description of the description\'s creator\n    ERA19.4 The system shall provide NARA-created default searches\n    ERA19.4.1 The system shall provide the capability for users to select a NARA default search from among available searches\n    ERA19.4.2 The system shall run the user-selected NARA default search\n    ERA19.5 The system shall provide the capability for the user to select the search complexity level, from simple single-variable searches to multi-variable complex searches\n    ERA19.6 The system shall provide the capability to control search run times\n    ERA19.6.1 The system shall limit search run times in a pre-emptive manner\n    ERA19.6.2 The system shall provide the capability for users to adjust search run time limits\n    ERA19.7 The system shall provide information to the user while the search is in progress\n    ERA19.7.1 The system shall provide a search progress indicator\n    ERA19.7.2 The system shall display the search parameters selected by the user\n    ERA19.7.3 The 
system shall provide an estimate to the user of how long the search will take to execute\n    ERA19.7.4 The system shall notify the user when the search is complete\n    ERA19.8 The system shall present the search results set to the user\n    ERA19.8.1 The system shall display a search results set that includes all assets meeting the search criteria\n    ERA19.8.2 The system shall display an explanation for the reason for withholding assets\n    ERA19.8.3 The system shall exclude from display assets whose existence cannot be disclosed to the requesting user\n    ERA19.8.4 The system shall display the total number of results in the result set returned by the search\n    ERA19.8.5 The system shall provide the capability for the user to select the quantity of search results to be presented in the results set\n    ERA19.8.6 The system shall provide the capability for users to select the order in which the result set is presented\n    ERA19.8.7 The system shall provide the capability to rank the results of the search by relevance\n    ERA19.8.8 The system shall present the search results set at user-selectable levels of detail\n    ERA19.8.9 The system shall indicate different versions of a record included in the search result set\n    ERA19.9 The system shall provide the capability for a user to refine a search\n    ERA19.9.1 The system shall provide the capability to search within the result set returned by the initial search\n    ERA19.9.1.1 The system shall provide a "more like this" capability to refine a search for more assets similar to those returned by the search\n    ERA19.9.1.2 The system shall provide the capability to refine a search using any search criteria available in the system\n    ERA19.9.2 The system shall provide the capability to stop a search in progress in order to refine the search\n    ERA19.10 The system shall provide the capability for the 
user to select the assets they wish to access from among the search results set\n    ERA19.11 The system shall provide the capability to save a search\n    ERA19.11.1 The system shall provide the capability for the user to select a saved search from their saved searches\n    ERA19.11.2 The system shall provide the capability to run saved searches\n    ERA19.12 The system shall provide the capability for users to store results sets over time\n    ERA19.12.1 The system shall provide the capability to store search results\n    ERA19.12.2 The system shall provide the capability to save selected portions of results sets\n    ERA19.12.3 The system shall maintain a search results set for a specified period of time\n    ERA19.13 The system shall manage mediated searches\n    ERA19.13.1 The system shall provide the capability to request a mediated search\n    ERA19.13.2 The system shall provide the capability for mediated searchers to dialog with search requestors about their mediated search\n    ERA19.13.3 The system shall provide the capability to manage mediated search request responses\n    ERA19.13.4 The system shall provide the capability to prioritize mediated searches\n    ERA20 The system shall provide access to the assets it contains\n    ERA20.1 The system shall provide the capability to electronically present all electronic record types\n    ERA20.2 The system shall provide the capability for users to request copies of assets\n    ERA20.3 The system shall provide the capability to output copies of all assets\n    ERA20.3.1 The system shall provide the capability to output all assets to media\n    ERA20.3.1.1 The system shall print address labels for media orders\n    ERA20.3.1.2 The system shall print packing lists for media orders\n    ERA20.3.2 The system shall provide the capability to print all printable assets\n    ERA20.4 The system shall provide access to assets independently of the 
hardware with which they were created\n    ERA20.4.1 The system shall provide the capability to output assets independently of the hardware with which they were created\n    ERA20.4.2 The system shall provide the capability to electronically present assets independently of the hardware with which they were created\n    ERA20.5 The system shall provide access to assets independently of the software with which they were created\n    ERA20.5.1 The system shall provide the capability to output assets independently of the software with which they were created\n    ERA20.5.2 The system shall provide the capability to electronically present assets independently of the software with which they were created\n    ERA20.6 The system shall provide the capability to access an entire electronic record\n    ERA20.7 The system shall provide the capability to access a set of electronic records\n    ERA20.8 The system shall provide the capability to access a portion of an electronic record\n    ERA20.9 The system shall provide the capability to access all digital components of an electronic record\n    ERA20.10 The system shall provide the capability to output assets in formats selected by the user from available choices\n    ERA20.10.1 The system shall provide the capability for users to select the output format of selected assets from among available formats\n    ERA20.10.2 The system shall output certified copies of electronic records in formats selectable by the user from available choices\n    ERA20.10.3 The system shall output certified copies of electronic records on media selectable by the user from available choices\n    ERA20.10.4 The system shall provide the capability to output selected asset formats via telecommunications\n    ERA20.11 The system shall maintain the authenticity of an electronic record during access\n    ERA20.11.1 The system shall maintain electronic record content during access\n    ERA20.11.2 The system shall maintain electronic record specified behavior during access\n    ERA20.11.3 The system shall maintain electronic record context during access\n    ERA20.11.4 The system shall maintain electronic record structure during access\n    ERA20.11.5 The system shall maintain electronic record presentation during access\n    ERA20.11.6 The system shall provide the capability to present digital components of electronic records individually\n    ERA20.11.7 The system shall provide the capability to output digital components of electronic records individually\n    ERA20.11.8 The system shall provide the capability to present electronic records composed of multiple digital components\n    ERA20.11.9 The system shall provide the capability to output electronic records composed of multiple digital components\n'